Boundary behavior in High Dimension, Low Sample Size asymptotics of PCA

Authors

  • Sungkyu Jung
  • Arusharka Sen
  • J. S. Marron
Abstract

In High Dimension, Low Sample Size (HDLSS) data situations, where the dimension d is much larger than the sample size n, principal component analysis (PCA) plays an important role in statistical analysis. Under which conditions does the sample PCA reflect the population covariance structure well? We answer this question in a relevant asymptotic context where d grows and n is fixed, under a generalized spiked covariance model. Specifically, the largest population eigenvalues are assumed to be of the order d^α, where α can be smaller than, equal to, or greater than 1. Earlier results give the conditions for consistency and strong inconsistency of the eigenvectors of the sample covariance matrix. In the boundary case, α = 1, where the sample PC directions are neither consistent nor strongly inconsistent, we show that the eigenvalues and eigenvectors do not degenerate but have limiting distributions. The result smoothly bridges the phase transition represented by the other two cases and thus gives a full spectrum of limits for sample PCA in the HDLSS asymptotics. While the results hold in a general setting, the limiting distributions under the Gaussian assumption are illustrated in greater detail. In addition, the geometric representation of HDLSS data is extended to give three different representations, depending on the magnitude of the variances in the first few principal components.
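
As a rough numerical illustration of the phase transition described in the abstract, the following sketch (illustrative code written for this page, not taken from the paper) simulates a single-spike Gaussian model in which the largest population eigenvalue equals d^α and all remaining eigenvalues equal 1, then measures the angle between the leading sample PC direction and the true first eigenvector e_1 as d grows with n fixed. The function angle_to_first_pc and the particular choices of n, d, and α are assumptions made only for this example.

import numpy as np

rng = np.random.default_rng(0)

def angle_to_first_pc(d, n, alpha):
    """Angle (degrees) between the leading sample PC direction and e_1."""
    sd = np.ones(d)
    sd[0] = np.sqrt(d ** alpha)           # spike: lambda_1 = d**alpha, all others = 1
    X = rng.standard_normal((n, d)) * sd  # n rows drawn from N(0, diag(d**alpha, 1, ..., 1))
    Xc = X - X.mean(axis=0)               # center the data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)  # row 0 of Vt = first sample PC direction
    cos = min(abs(Vt[0, 0]), 1.0)         # |<v_hat_1, e_1>|, clipped for numerical safety
    return np.degrees(np.arccos(cos))

n = 20
for alpha in (0.5, 1.0, 1.5):
    angles = [angle_to_first_pc(d, n, alpha) for d in (100, 1000, 10000)]
    print(f"alpha = {alpha}: angles for d = 100, 1000, 10000 ->",
          ["%.1f" % a for a in angles])

# Expected tendency as d grows: alpha > 1 gives angles near 0 (consistency),
# alpha < 1 gives angles near 90 (strong inconsistency), and alpha = 1 gives a
# non-degenerate random angle, matching the boundary case studied in the paper.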


Similar articles

Asymptotics for High Dimension, Low Sample Size data and Analysis of Data on Manifolds

SUNGKYU JUNG: Asymptotics for High Dimension, Low Sample Size data and Analysis of Data on Manifolds. (Under the direction of Dr. J. S. Marron.) The dissertation consists of two research topics regarding modern non-standard data analytic situations. In particular, data under the High Dimension, Low Sample Size (HDLSS) situation and data lying on manifolds are analyzed. These situations are rela...


PCA Consistency in High Dimension, Low Sample Size Context

Principal Component Analysis (PCA) is an important tool of dimension reduction especially when the dimension (or the number of variables) is very high. Asymptotic studies where the sample size is fixed, and the dimension grows (i.e. High Dimension, Low Sample Size (HDLSS)) are becoming increasingly relevant. We investigate the asymptotic behavior of the Principal Component (PC) directions. HDLS...


Geometric representation of high dimension, low sample size data

High dimension, low sample size data are emerging in various areas of science. We find a common structure underlying many such data sets by using a non-standard type of asymptotics: the dimension tends to infinity while the sample size is fixed. Our analysis shows a tendency for the data to lie deterministically at the vertices of a regular simplex. Essentially all the randomness in the data appears o...
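
The simplex tendency described in this abstract can be checked with a few lines of simulation. The sketch below is illustrative only: it uses iid standard Gaussian coordinates and assumed values of n and d rather than any setup from the cited paper, and verifies that pairwise distances scaled by 1/sqrt(d) concentrate near sqrt(2), which is what makes the n points resemble the vertices of a regular simplex after scaling.

import numpy as np

rng = np.random.default_rng(1)
n = 5
for d in (100, 10_000, 100_000):
    X = rng.standard_normal((n, d))                 # n points with iid N(0, 1) coordinates
    diffs = X[:, None, :] - X[None, :, :]           # (n, n, d) array of pairwise differences
    D = np.linalg.norm(diffs, axis=2) / np.sqrt(d)  # pairwise distances scaled by 1/sqrt(d)
    off_diag = D[np.triu_indices(n, k=1)]           # distinct pairs only
    print(f"d = {d}: scaled distances in [{off_diag.min():.3f}, {off_diag.max():.3f}]"
          f"  (limit sqrt(2) = {np.sqrt(2):.3f})")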


Consistency of sparse PCA in High Dimension, Low Sample Size contexts

Sparse Principal Component Analysis (PCA) methods are efficient tools to reduce the dimension (or number of variables) of complex data. Sparse principal components (PCs) are easier to interpret than conventional PCs, because most loadings are zero. We study the asymptotic properties of these sparse PC directions for scenarios with fixed sample size and increasing dimension (i.e. High Dimension,...


Asymptotic Properties of Distance-Weighted Discrimination

While Distance-Weighted Discrimination (DWD) is an appealing approach to classification in high dimensions, it was designed for balanced data sets. In the case of unequal costs, biased sampling or unbalanced data, there are major improvements available, using appropriately weighted versions of DWD. A major contribution of this paper is the development of optimal weighting schemes for various no...



Journal:
  • J. Multivariate Analysis

Volume: 109   Issue: –

Pages: –

Publication year: 2012